Clean directory and load packages
Helper function that takes a list of silly codes (sillyvec) from the clipboard, converts it to a “;”-separated string for OceanAK lookup, and pastes the result back to the clipboard
sillyvec
[1] "PAFK94H" "PAFK14" "PAFK91H" "PAFK13" "PCANN14" "PCANN91H" "PCANN15" "PVFDA94H" "PVFDA14" "PVFDE91H" "PVFDA15" "PDUCK94T"
[13] "PDUCK14E" "PDUCK14L" "PERBE91T" "PERBL91T" "PERB17" "PGREG94" "PGREG14E" "PGREG14L" "PGREG93U" "PSIWAS15" "PSPRING14" "PSPRING15"
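The helper itself is not shown in this notebook. Below is a minimal sketch of just the string-conversion part. The function name `collapse_sillyvec` is hypothetical, and clipboard I/O is left out because it is platform-specific (for example, `utils::readClipboard()` and `utils::writeClipboard()` exist only on Windows).

```r
# Hypothetical helper: collapse a vector of silly codes into one ";"-separated string
collapse_sillyvec <- function(sillyvec) {
  sillyvec <- unique(trimws(sillyvec))    # drop stray whitespace and repeated codes
  sillyvec <- sillyvec[nzchar(sillyvec)]  # drop empty entries
  paste(sillyvec, collapse = ";")
}

collapse_sillyvec(c("PAFK94H", "PAFK14", "PAFK91H"))
```

The resulting string could then be pasted into an OceanAK prompt that accepts “;”-separated values.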
The objective of this notebook is to create an extraction list for our collaborative PWS Pink Salmon whole genome re-sequencing (WGR) project with the Christie Lab at Purdue University. The study design for this extraction list comes from:
V:\Documents\5_Coastwide\Multispecies\AHRP\Pink Salmon Disaster Funding\Round2\Objective 11 PWS WGR\Sample Units.xlsx
Sheet 2
This project is using leftover Pink Salmon Disaster 2016 funds to address questions about potential genetic mechanisms causing reduced relative reproductive success (RRS). This project is broken into two main questions:
Screenshot of sample design
Using “output/PWS Pink Salmon WGR Extraction List.xlsx” and “output/extraction_selection.xlsx” to hand-pick most extractions
Updating to replace Gregorieff Creek 1993 and Siwash Creek 2015 as the VFDA brood source collections for the odd-lineage
Replacing with Duck River 1991 and 2013. Duck River was not the brood source for VFDA, but it is the closest stream by distance. Gregorieff 1993 only had 16 alevin samples and Siwash 2015 did not have paired otolith data (so probably a bunch of strays mixed in).
Need the tissue information from LOKI tissue table
Need the sex data from LOKI fish table
Need the otolith data from warehouse
Pick fish and format for extraction (use PWS Pink as a template)
DWP information and tissue missing data lives here…
(loki_tissue_og <- readr::read_csv(file = "../data/LOKI_tissue_GEN_SAMPLED_FISH_TISSUE.csv"))
Warning: One or more parsing issues, see `problems()` for details
Rows: 19639 Columns: 32
-- Column specification -------------------------------------------------------------------------------------------------------------
Delimiter: ","
chr (11): Silly Code, PK_TISSUE_TYPE, CAPTURE_LOCATION, DNA_TRAY_CODE, DNA_TRAY_WELL_POS, STORAGE_ID, UNIT, SLOT, EXHAUSTED_HOW,...
dbl (9): FK_COLLECTION_ID, FK_FISH_ID, LATITUDE, LONGITUDE, CONTAINER_ARRAY_TYPE_ID, DNA_TRAY_WORKBENCH_ID, DNA_TRAY_WELL_CODE,...
lgl (9): MESH_SIZE, MESH_SIZE_COMMENT, IS_MISSING_PAIRED_DATA_EXISTS, WELL_HAS_MORE_THAN_ONE_SAMPLE, IS_PRESENT_IN_DATASHEET, I...
date (3): CAPTURE_DATE, END_CAPTURE_DATE, EXHAUSTED_DATE
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
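The parsing warning above points at `problems()`. Here is a small sketch of how the flagged values might be inspected, using made-up CSV text rather than the LOKI export. The column names and types are assumptions for illustration only.

```r
library(readr)

# Made-up CSV text; "two" cannot be parsed as a number
csv_text <- "fish_id,sex\n1,M\ntwo,F\n3,U\n"

toy <- readr::read_csv(
  I(csv_text),
  col_types = readr::cols(
    fish_id = readr::col_double(),
    sex = readr::col_character()
  )
)

# One row per value that failed to parse
parse_issues <- readr::problems(toy)
parse_issues
```

Running the same call on `loki_tissue_og` would show whether the parsing issues touch any of the columns used downstream.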
Modify, filter for silly codes of interest and remove any fish with known missing tissues
(
loki_tissue <- loki_tissue_og %>%
dplyr::filter(
`Silly Code` %in% c("PDUCK14E", "PDUCK14L", "PGREG14E", "PGREG14L"),
PK_TISSUE_TYPE == "Heart",
is.na(IS_MISSING_PAIRED_DATA_EXISTS),
is.na(WELL_HAS_MORE_THAN_ONE_SAMPLE)
) %>%
dplyr::rename(
silly = `Silly Code`,
fish_id = FK_FISH_ID,
tissue_type = PK_TISSUE_TYPE,
dwp_barcode = DNA_TRAY_CODE,
dwp_well = DNA_TRAY_WELL_CODE
) %>%
tidyr::unite(
col = "silly_source",
c(silly, fish_id),
sep = "_",
remove = FALSE
) %>%
dplyr::select(
silly,
fish_id,
silly_source,
tissue_type,
dwp_barcode,
dwp_well
)
)
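The same unite step that builds the `silly_source` key recurs for every collection in this notebook. A small helper might cut down on copy-paste. This is only a sketch: `add_silly_source` is a hypothetical name and the rows are made up.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: build the "silly_source" key (silly code + "_" + fish ID)
# while keeping the original silly and fish_id columns.
add_silly_source <- function(df) {
  tidyr::unite(df, col = "silly_source", c(silly, fish_id), sep = "_", remove = FALSE)
}

# Made-up rows for illustration only
toy_tissue <- tibble::tibble(
  silly = c("PDUCK14E", "PGREG14L"),
  fish_id = c(1, 27),
  tissue_type = "Heart"
)

toy_keyed <- add_silly_source(toy_tissue)
toy_keyed
```

The helper assumes the columns have already been renamed to `silly` and `fish_id`, as in the chunk above.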
Sex data lives here…
(loki_fish_og <- readr::read_csv(file = "../data/LOKI_fish_ASL Import.csv"))
Rows: 16927 Columns: 11
-- Column specification -------------------------------------------------------------------------------------------------------------
Delimiter: ","
chr (2): Silly Code, Sex
dbl (6): Collection ID, Fish ID, Freshwater Age, Ocean Age, Scale Card Number, Scale Card Position
lgl (3): Length, Weight, ASL Number
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Modify, filter for silly codes of interest
(
loki_fish <- loki_fish_og %>%
dplyr::filter(
`Silly Code` %in% c("PDUCK14E", "PDUCK14L", "PGREG14E", "PGREG14L")
) %>%
dplyr::rename(
silly = `Silly Code`,
fish_id = `Fish ID`,
sex = Sex
) %>%
tidyr::unite(
col = "silly_source",
c(silly, fish_id),
sep = "_",
remove = TRUE
) %>%
dplyr::select(silly_source, sex)
)
Otolith read data lives here…
(oceanak_og <- readr::read_csv(file = "../data/Duck 2014 and Gregorieff 2014 AHRP Salmon Biological Data 20220318_160209.csv"))
Rows: 1099 Columns: 22
-- Column specification -------------------------------------------------------------------------------------------------------------
Delimiter: ","
chr (6): SILLY_CODE, TISSUE_TYPE, LOCATION_CODE, OTOLITH_MARK_PRESENT, OTOLITH_MARK_ID, OTOLITH_MARK_STATUS_CODE
dbl (8): COLLECTION_ID, FISH_ID, DNA_TRAY_CODE, DNA_TRAY_WELL_CODE, SAMPLE_ID, SAMPLE_YEAR, IS_MISSING_PAIRED_DATA_EXISTS, WELL_...
lgl (7): SEX, LENGTH_MM, TARGET_DNA_TRAY_CODE, TARGET_DNA_TRAY_WELL_POS, TARGET_CONTAINER_ARRAY_TYPE_ID, CONTAINER_ARRAY_TYPE, D...
dttm (1): SAMPLE_DATE
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Modify, filter for silly codes of interest and tissue type “Otolith”
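The chunk that builds `oceanak` for these collections does not appear in this export, yet the join that follows uses it. Below is a sketch of what such a step could look like, modelled on the Duck 2013 chunk later in this notebook. Toy rows stand in for `oceanak_og`, and the `_toy` names are hypothetical.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for oceanak_og (made-up rows; real data come from the OceanAK export)
oceanak_og_toy <- tibble::tibble(
  SILLY_CODE = c("PDUCK14E", "PDUCK14E", "PGREG14L", "PSTOCK14"),
  FISH_ID = c(1, 1, 5, 9),
  TISSUE_TYPE = c("Otolith", "Heart", "Otolith", "Otolith"),
  OTOLITH_MARK_PRESENT = c("NO", NA, "YES", "NO")
)

oceanak_toy <- oceanak_og_toy %>%
  # keep only the four 2014 collections and the otolith records
  dplyr::filter(
    SILLY_CODE %in% c("PDUCK14E", "PDUCK14L", "PGREG14E", "PGREG14L"),
    TISSUE_TYPE == "Otolith"
  ) %>%
  dplyr::rename(
    silly = SILLY_CODE,
    fish_id = FISH_ID,
    otolith_mark_present = OTOLITH_MARK_PRESENT
  ) %>%
  # same join key as the tissue and fish tables
  tidyr::unite(col = "silly_source", c(silly, fish_id), sep = "_", remove = FALSE) %>%
  dplyr::select(silly_source, otolith_mark_present)

oceanak_toy
```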
Join all three of these data sources into something useful, then filter by our collections (Duck and Gregorieff 2014), non-missing tissue, natural-origin, split evenly across silly codes and sexes.
(join_duck_greg <- loki_tissue %>%
dplyr::left_join(loki_fish, by = "silly_source") %>%
dplyr::left_join(oceanak, by = "silly_source")
)
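A left join on `silly_source` adds rows if the key is duplicated in the right-hand table and leaves `NA` where there is no match. Here is a quick sanity check along those lines, sketched with made-up rows. The `_toy` objects are hypothetical.

```r
library(dplyr)

tissue_toy <- tibble::tibble(
  silly_source = c("PDUCK14E_1", "PDUCK14E_2", "PGREG14E_1")
)
fish_toy <- tibble::tibble(
  silly_source = c("PDUCK14E_1", "PDUCK14E_2"),
  sex = c("M", "F")
)

joined_toy <- dplyr::left_join(tissue_toy, fish_toy, by = "silly_source")

# A left join should not change the row count when the key is unique on the right
stopifnot(nrow(joined_toy) == nrow(tissue_toy))

# Tissue rows with no matching sex record
unmatched_toy <- dplyr::anti_join(tissue_toy, fish_toy, by = "silly_source")
unmatched_toy
```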
From each silly, grab the first 9 fish from each sex that are not otolith marked (i.e. natural-origin fish). Total of 18 fish per silly, 36 fish per sampling unit (early and late are still one sampling unit).
(
extraction_duck_greg <- join_duck_greg %>%
dplyr::filter(otolith_mark_present == "NO",
sex != "U") %>%
dplyr::group_by(silly, sex) %>%
dplyr::slice_min(order_by = fish_id, n = 9) %>%
dplyr::ungroup() %>%
dplyr::arrange(silly, fish_id) %>%
dplyr::select(
silly_source,
silly,
fish_id,
dwp_barcode,
dwp_well,
tissue_type,
sex,
otolith_mark_present
)
)
Write it out for posterity
readr::write_csv(x = extraction_duck_greg, file = "../output/extraction_selection_PDUCK14E_PDUCK14L_PGREG14E_PGREG14L.csv")
DWP information and tissue missing data lives here…
(loki_tissue_og <- readr::read_csv(file = "../data/LOKI_tissue_GEN_SAMPLED_FISH_TISSUE.csv"))
Warning: One or more parsing issues, see `problems()` for details
Rows: 19639 Columns: 32
-- Column specification -------------------------------------------------------------------------------------------------------------
Delimiter: ","
chr (11): Silly Code, PK_TISSUE_TYPE, CAPTURE_LOCATION, DNA_TRAY_CODE, DNA_TRAY_WELL_POS, STORAGE_ID, UNIT, SLOT, EXHAUSTED_HOW,...
dbl (9): FK_COLLECTION_ID, FK_FISH_ID, LATITUDE, LONGITUDE, CONTAINER_ARRAY_TYPE_ID, DNA_TRAY_WORKBENCH_ID, DNA_TRAY_WELL_CODE,...
lgl (9): MESH_SIZE, MESH_SIZE_COMMENT, IS_MISSING_PAIRED_DATA_EXISTS, WELL_HAS_MORE_THAN_ONE_SAMPLE, IS_PRESENT_IN_DATASHEET, I...
date (3): CAPTURE_DATE, END_CAPTURE_DATE, EXHAUSTED_DATE
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Modify, filter for silly codes of interest and remove any fish with known missing tissues
(
loki_tissue <- loki_tissue_og %>%
dplyr::filter(
`Silly Code` %in% c("PERBE91T", "PERBL91T"),
PK_TISSUE_TYPE == "Heart",
is.na(IS_MISSING_PAIRED_DATA_EXISTS),
is.na(WELL_HAS_MORE_THAN_ONE_SAMPLE)
) %>%
dplyr::rename(
silly = `Silly Code`,
fish_id = FK_FISH_ID,
tissue_type = PK_TISSUE_TYPE,
dwp_barcode = DNA_TRAY_CODE,
dwp_well = DNA_TRAY_WELL_CODE
) %>%
tidyr::unite(
col = "silly_source",
c(silly, fish_id),
sep = "_",
remove = FALSE
) %>%
dplyr::select(
silly,
fish_id,
silly_source,
tissue_type,
dwp_barcode,
dwp_well
)
)
Wait, what? No tissues for PERBL91T….
Cool, no known missing, grab first 36 from PERBE91T until we figure out PERBL91T.
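The code for this step is not shown here. Below is a sketch of how the missing silly could be confirmed and the first 36 fish picked, using made-up rows in place of `loki_tissue`. Object names are hypothetical.

```r
library(dplyr)

expected_sillys <- c("PERBE91T", "PERBL91T")

# Made-up tissue rows: only PERBE91T is present
tissue_toy <- tibble::tibble(
  silly = rep("PERBE91T", 40),
  fish_id = 1:40
)

# Count rows per expected silly; a factor with fixed levels keeps zero-count codes
tissue_counts <- tissue_toy %>%
  dplyr::mutate(silly = factor(silly, levels = expected_sillys)) %>%
  dplyr::count(silly, .drop = FALSE)

# First 36 fish by fish_id from the collection that does have tissues
extraction_toy <- tissue_toy %>%
  dplyr::filter(silly == "PERBE91T") %>%
  dplyr::slice_min(order_by = fish_id, n = 36)

tissue_counts
```

Counting against the expected codes makes an empty collection show up as an explicit zero rather than as a missing row.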
Need the tissue information from LOKI tissue table
Need the sex and otolith data from warehouse
Pick fish and format for extraction (use PWS Pink as a template)
DWP information and tissue missing data lives here…
(loki_tissue_og <- readr::read_csv(file = "../data/LOKI_tissue_GEN_SAMPLED_FISH_TISSUE_PERB17_20220321_150821.csv"))
Warning: One or more parsing issues, see `problems()` for details
Rows: 14955 Columns: 47
-- Column specification -------------------------------------------------------------------------------------------------------------
Delimiter: ","
chr (14): SILLY_CODE, PK_TISSUE_TYPE, CAPTURE_LOCATION, STORAGE_ID, UNIT, DNA_TRAY_CODE, DNA_TRAY_WELL_POS, REGION_CODE, QUADRAN...
dbl (14): FK_COLLECTION_ID, FK_FISH_ID, SHELF_RACK, SLOT, LATITUDE, LONGITUDE, DNA_TRAY_WELL_CODE, IS_MISSING_PAIRED_DATA_EXISTS...
lgl (17): VIAL_BARCODE, EXHAUSTED_HOW, EXHAUSTED_BY, EXHAUSTED_DATE, MESH_SIZE, MESH_SIZE_COMMENT, AGENCY, COLLECTION_DATE, OTHE...
dttm (2): CAPTURE_DATE, END_CAPTURE_DATE
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Modify, filter for silly codes of interest and remove any fish with known missing tissues
erb_2017_oto_fish_ids <- loki_tissue_og %>%
dplyr::filter(PK_TISSUE_TYPE == "Otolith") %>%
dplyr::pull(FK_FISH_ID)
(
loki_tissue <- loki_tissue_og %>%
dplyr::filter(
FK_FISH_ID %in% erb_2017_oto_fish_ids,
PK_TISSUE_TYPE == "Heart-bulbus arteriosus",
IS_MISSING_PAIRED_DATA_EXISTS == 0,
WELL_HAS_MORE_THAN_ONE_SAMPLE == 0
) %>%
dplyr::rename(
silly = SILLY_CODE,
fish_id = FK_FISH_ID,
tissue_type = PK_TISSUE_TYPE,
dwp_barcode = DNA_TRAY_CODE,
dwp_well = DNA_TRAY_WELL_CODE
) %>%
tidyr::unite(
col = "silly_source",
c(silly, fish_id),
sep = "_",
remove = FALSE
) %>%
dplyr::select(
silly,
fish_id,
silly_source,
tissue_type,
dwp_barcode,
dwp_well
)
)
Sex and otolith read data lives here…
(oceanak_og <- readr::read_csv(file = "../data/Erb 2017 AHRP Salmon Biological Data 20220321_151027.csv"))
Rows: 14955 Columns: 22
-- Column specification -------------------------------------------------------------------------------------------------------------
Delimiter: ","
chr (9): SILLY_CODE, SEX, TISSUE_TYPE, DNA_TRAY_CODE, LOCATION_CODE, SAMPLE_ID, OTOLITH_MARK_PRESENT, OTOLITH_MARK_ID, OTOLITH_M...
dbl (7): COLLECTION_ID, FISH_ID, LENGTH_MM, DNA_TRAY_WELL_CODE, SAMPLE_YEAR, IS_MISSING_PAIRED_DATA_EXISTS, WELL_HAS_MORE_THAN_O...
lgl (5): TARGET_DNA_TRAY_CODE, TARGET_DNA_TRAY_WELL_POS, TARGET_CONTAINER_ARRAY_TYPE_ID, CONTAINER_ARRAY_TYPE, DETERMINATION_COL...
dttm (1): SAMPLE_DATE
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Modify, filter for silly codes of interest and tissue type “Heart-bulbus arteriosus”
(
oceanak <- oceanak_og %>%
dplyr::filter(
TISSUE_TYPE == "Heart-bulbus arteriosus"
) %>%
dplyr::rename(
silly = SILLY_CODE,
fish_id = FISH_ID,
tissue_type = TISSUE_TYPE,
dwp_barcode = DNA_TRAY_CODE,
dwp_well = DNA_TRAY_WELL_CODE,
sample_date = SAMPLE_DATE,
sex = SEX,
otolith_mark_present = OTOLITH_MARK_PRESENT
) %>%
dplyr::mutate(sample_date = lubridate::ymd(sample_date)) %>%
tidyr::unite(
col = "silly_source",
c(silly, fish_id),
sep = "_",
remove = FALSE
) %>%
dplyr::select(
silly_source,
sample_date,
sex,
otolith_mark_present
)
)
Join both of these data sources into something useful, then filter by our collection (Erb 2017), non-missing tissue, natural-origin, split evenly across sexes.
(join_erb_2017 <- loki_tissue %>%
dplyr::left_join(oceanak, by = "silly_source")
)
From the single silly (PERB17), randomly sample 18 fish from each sex that are not otolith marked (i.e. natural-origin fish). Total of 36 fish per silly, 36 fish per sampling unit.
(
extraction_erb_2017 <- join_erb_2017 %>%
dplyr::filter(otolith_mark_present == "NO",
sex != "U") %>%
dplyr::group_by(silly, sex) %>%
dplyr::slice_sample(n = 18) %>%
dplyr::ungroup() %>%
dplyr::arrange(silly, fish_id) %>%
dplyr::select(
silly_source,
silly,
fish_id,
dwp_barcode,
dwp_well,
tissue_type,
sample_date,
sex,
otolith_mark_present
)
)
Write it out for posterity
readr::write_csv(x = extraction_erb_2017, file = "../output/extraction_selection_PERB17.csv")
Got sex data from Wei on 3/21/22
Need the tissue information from LOKI tissue table
Need the sex data from LOKI fish table
Need the otolith data from warehouse
Pick fish and format for extraction (use PWS Pink as a template)
DWP information and tissue missing data lives here…
(loki_tissue_og <- readr::read_csv(file = "../data/LOKI_tissue_GEN_SAMPLED_FISH_TISSUE_PDUCK13_20220322_103543.csv"))
Rows: 399 Columns: 47
-- Column specification -------------------------------------------------------------------------------------------------------------
Delimiter: ","
chr (15): SILLY_CODE, PK_TISSUE_TYPE, CAPTURE_LOCATION, EXHAUSTED_HOW, EXHAUSTED_BY, STORAGE_ID, UNIT, SLOT, DNA_TRAY_WELL_POS, ...
dbl (12): FK_COLLECTION_ID, FK_FISH_ID, SHELF_RACK, LATITUDE, LONGITUDE, DNA_TRAY_CODE, DNA_TRAY_WELL_CODE, IS_MISSING_PAIRED_DA...
lgl (16): VIAL_BARCODE, MESH_SIZE, MESH_SIZE_COMMENT, AGENCY, DNA_TRAY_WORKBENCH_ID, OTHER_AGENCY_KEY, OTO_INVENTORY_COMMENT, CO...
dttm (4): CAPTURE_DATE, EXHAUSTED_DATE, END_CAPTURE_DATE, COLLECTION_DATE
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Modify, filter for silly codes of interest and remove any fish with known missing tissues
(
loki_tissue <- loki_tissue_og %>%
dplyr::filter(
SILLY_CODE %in% c("PDUCK13"),
PK_TISSUE_TYPE == "Axillary Process",
is.na(IS_MISSING_PAIRED_DATA_EXISTS),
is.na(WELL_HAS_MORE_THAN_ONE_SAMPLE)
) %>%
dplyr::rename(
silly = SILLY_CODE,
fish_id = FK_FISH_ID,
tissue_type = PK_TISSUE_TYPE,
dwp_barcode = DNA_TRAY_CODE,
dwp_well = DNA_TRAY_WELL_CODE
) %>%
tidyr::unite(
col = "silly_source",
c(silly, fish_id),
sep = "_",
remove = FALSE
) %>%
dplyr::select(
silly,
fish_id,
silly_source,
tissue_type,
dwp_barcode,
dwp_well
)
)
Sex data lives here…
(loki_fish_og <- readr::read_csv(file = "../data/LOKI_tissue_GEN_SAMPLED_FISH_PDUCK13_20220322_104648.csv"))
Rows: 152 Columns: 27
-- Column specification -------------------------------------------------------------------------------------------------------------
Delimiter: ","
chr (2): SILLY_CODE, SEX
dbl (7): FK_COLLECTION_ID, FISH_ID, AGE_X, AGE_Y, YEAR_SAMPLED, SCALE_CARD_NUM, SCALE_CARD_POS
lgl (18): LENGTH, WEIGHT, SAMPLE_DATE_OLD, SAMPLE_DATE, STAT_WEEK_OLD, STAT_WEEK, PORT_CODE_OLD, PORT_CODE, STAT_AREA, DISTRICT_G...
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Modify, filter for silly codes of interest
(
loki_fish <- loki_fish_og %>%
dplyr::filter(
SILLY_CODE == "PDUCK13"
) %>%
dplyr::rename(
silly = SILLY_CODE,
fish_id = FISH_ID,
sex = SEX
) %>%
tidyr::unite(
col = "silly_source",
c(silly, fish_id),
sep = "_",
remove = TRUE
) %>%
dplyr::select(silly_source, sex)
)
Otolith read data lives here…
(oceanak_og <- readr::read_csv(file = "../data/Duck 2013 AHRP Salmon Biological Data 20220322_103611.csv"))
Rows: 304 Columns: 22
-- Column specification -------------------------------------------------------------------------------------------------------------
Delimiter: ","
chr (5): SILLY_CODE, TISSUE_TYPE, LOCATION_CODE, OTOLITH_MARK_PRESENT, OTOLITH_MARK_STATUS_CODE
dbl (8): COLLECTION_ID, FISH_ID, DNA_TRAY_CODE, DNA_TRAY_WELL_CODE, SAMPLE_ID, SAMPLE_YEAR, IS_MISSING_PAIRED_DATA_EXISTS, WELL_...
lgl (8): SEX, LENGTH_MM, OTOLITH_MARK_ID, TARGET_DNA_TRAY_CODE, TARGET_DNA_TRAY_WELL_POS, TARGET_CONTAINER_ARRAY_TYPE_ID, CONTAI...
dttm (1): SAMPLE_DATE
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Modify, filter for silly codes of interest and tissue type “Otolith”
(
oceanak <- oceanak_og %>%
dplyr::filter(
SILLY_CODE == "PDUCK13",
TISSUE_TYPE == "Otolith"
) %>%
dplyr::rename(
silly = SILLY_CODE,
fish_id = FISH_ID,
tissue_type = TISSUE_TYPE,
dwp_barcode = DNA_TRAY_CODE,
dwp_well = DNA_TRAY_WELL_CODE,
otolith_mark_present = OTOLITH_MARK_PRESENT
) %>%
tidyr::unite(
col = "silly_source",
c(silly, fish_id),
sep = "_",
remove = FALSE
) %>%
dplyr::select(
silly_source,
otolith_mark_present
)
)
Join all three of these data sources into something useful, then filter by our collection (Duck 2013), non-missing tissue, natural-origin, split evenly across sexes.
(join_duck_2013 <- loki_tissue %>%
dplyr::left_join(loki_fish, by = "silly_source") %>%
dplyr::left_join(oceanak, by = "silly_source")
)
From each silly, grab the first 18 fish from each sex that are not otolith marked (i.e. natural-origin fish). Total of 36 fish per silly, 36 fish per sampling unit.
(
extraction_duck_2013 <- join_duck_2013 %>%
dplyr::filter(otolith_mark_present == "NO",
sex != "U") %>%
dplyr::group_by(silly, sex) %>%
dplyr::slice_min(order_by = fish_id, n = 18) %>%
dplyr::ungroup() %>%
dplyr::arrange(silly, fish_id) %>%
dplyr::select(
silly_source,
silly,
fish_id,
dwp_barcode,
dwp_well,
tissue_type,
sex,
otolith_mark_present
)
)
Write it out for posterity
readr::write_csv(x = extraction_duck_2013, file = "../output/extraction_selection_PDUCK13.csv")
Need hatchery strays and natural-origin homing fish (i.e. those with natural-origin parents)
Split by sex, sampled throughout the season
Bad practice, but I’m going to grab output data from PWS-Pink-Parentage
(
all_streams_parents_paired_14_16_cross <-
readr::read_csv(file = "../../PWS Pink/GitHub-PWS-Pink-Parentage/Stockdale_Hogan_Gilmour_Paddy_Erb/all_streams_parents_paired_14_16_cross.csv")
)
Rows: 1360 Columns: 89
-- Column specification -------------------------------------------------------------------------------------------------------------
Delimiter: ","
chr (35): offspring_id, stream_off, origin_off, sex_off, intertidal_off, otolith_mark_present_off, silly_off, sample_off, dna_tr...
dbl (41): year_off, DOY_off, length_off, distance_mouth_off, distance_tide_off, fish_id_off, dna_tray_well_code_off, riverdist_s...
lgl (10): otolith_mark_id_off, pre_spawn_off, partial_spawn_off, preyed_upon_off, pre_spawn_sire, pre_spawn_dam, partial_spawn_s...
date (3): date_off, date_sire, date_dam
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
(
all_streams_parents_paired_14_16 <-
readr::read_csv(file = "../../PWS Pink/GitHub-PWS-Pink-Parentage/Stockdale_Hogan_Gilmour_Paddy_Erb/all_streams_parents_paired_14_16.csv")
)
Rows: 7988 Columns: 59
-- Column specification -------------------------------------------------------------------------------------------------------------
Delimiter: ","
chr (23): offspring_id, parent, parent_id, stream_off, origin_off, sex_off, intertidal_off, otolith_mark_present_off, silly_off,...
dbl (27): year_off, DOY_off, length_off, distance_mouth_off, distance_tide_off, fish_id_off, dna_tray_well_code_off, riverdist_s...
lgl (7): otolith_mark_id_off, pre_spawn_off, partial_spawn_off, preyed_upon_off, pre_spawn_par, partial_spawn_par, preyed_upon_par
date (2): date_off, date_par
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
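Filter for Stockdale and only natural x natural (NN) crosses, then keep one randomly chosen offspring per mating, per sire, and per dam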
stockdale_2016_natural_selection_NN_cross <- all_streams_parents_paired_14_16_cross %>%
dplyr::filter(stream_off == stream_sire,
stream_off == "Stockdale",
origin_sire == origin_dam,
origin_sire == "Natural") %>%
dplyr::group_by(mating_id) %>%
dplyr::slice_sample(n = 1, replace = FALSE) %>%
dplyr::group_by(parent_id_sire) %>%
dplyr::slice_sample(n = 1, replace = FALSE) %>%
dplyr::group_by(parent_id_dam) %>%
dplyr::slice_sample(n = 1, replace = FALSE) %>%
dplyr::ungroup()
stockdale_2016_natural_selection_NN_cross %>%
dplyr::count(sex_off)
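Because `slice_sample()` is random, rerunning the chunk above picks different fish unless a seed is set. Below is a sketch with made-up triads showing a seeded draw and a check that no sire or dam is used twice. The seed value, the `_toy` objects, and the way `mating_id` is built here are assumptions.

```r
library(dplyr)

set.seed(2022)  # arbitrary seed so the random draw can be repeated

# Made-up triads: offspring with an assigned sire and dam
cross_toy <- tibble::tibble(
  offspring_id = paste0("off_", 1:6),
  parent_id_sire = c("s1", "s1", "s1", "s2", "s2", "s3"),
  parent_id_dam = c("d1", "d1", "d2", "d2", "d3", "d3")
) %>%
  dplyr::mutate(mating_id = paste(parent_id_sire, parent_id_dam, sep = "x"))

# One offspring per mating, then per sire, then per dam
selection_toy <- cross_toy %>%
  dplyr::group_by(mating_id) %>%
  dplyr::slice_sample(n = 1) %>%
  dplyr::group_by(parent_id_sire) %>%
  dplyr::slice_sample(n = 1) %>%
  dplyr::group_by(parent_id_dam) %>%
  dplyr::slice_sample(n = 1) %>%
  dplyr::ungroup()

# No sire or dam should appear more than once in the final selection
stopifnot(
  !anyDuplicated(selection_toy$parent_id_sire),
  !anyDuplicated(selection_toy$parent_id_dam)
)
```

Each slicing step only removes rows, so uniqueness gained at the sire step is preserved through the dam step.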
Modify for extraction format
(
stockdale_2016_natural_selection_NN_cross <-
stockdale_2016_natural_selection_NN_cross %>%
dplyr::rename(
silly = silly_off,
fish_id = fish_id_off,
dwp_barcode = dna_tray_code_off,
dwp_well = dna_tray_well_code_off,
sample_date = date_off,
sex = sex_off,
otolith_mark_present = otolith_mark_present_off
) %>%
dplyr::mutate(tissue_type = "Heart-bulbus arteriosus") %>%
tidyr::unite(
col = "silly_source",
c(silly, fish_id),
sep = "_",
remove = FALSE
) %>%
dplyr::arrange(silly, fish_id) %>%
dplyr::select(
silly_source,
silly,
fish_id,
dwp_barcode,
dwp_well,
tissue_type,
sample_date,
sex,
otolith_mark_present
)
)
Still need 1 more Male, pick from dyad data
(
stockdale_2016_natural_selection_N_dyad <-
all_streams_parents_paired_14_16 %>%
dplyr::rename(
silly = silly_off,
fish_id = fish_id_off,
dwp_barcode = dna_tray_code_off,
dwp_well = dna_tray_well_code_off,
sample_date = date_off,
sex = sex_off,
otolith_mark_present = otolith_mark_present_off
) %>%
dplyr::mutate(tissue_type = "Heart-bulbus arteriosus") %>%
tidyr::unite(
col = "silly_source",
c(silly, fish_id),
sep = "_",
remove = FALSE
) %>%
dplyr::filter(
stream_off == stream_par,
stream_off == "Stockdale",
origin_par == "Natural",
sex == "Male",
!(
silly_source %in% stockdale_2016_natural_selection_NN_cross$silly_source
),
!(
parent_id %in% all_streams_parents_paired_14_16_cross$parent_id_dam
),
!(
parent_id %in% all_streams_parents_paired_14_16_cross$parent_id_sire
)
) %>%
dplyr::group_by(sex) %>%
dplyr::slice_sample(n = 1, replace = FALSE) %>%
dplyr::ungroup() %>%
dplyr::arrange(silly, fish_id) %>%
dplyr::select(
silly_source,
silly,
fish_id,
dwp_barcode,
dwp_well,
tissue_type,
sample_date,
sex,
otolith_mark_present
)
)
stockdale_2016_natural_selection_N_dyad %>%
dplyr::count(sex)
Bind together, make sure no duplicates
(
extraction_PSTOCK16_natural <-
dplyr::bind_rows(
stockdale_2016_natural_selection_NN_cross,
stockdale_2016_natural_selection_N_dyad
) %>%
dplyr::arrange(silly, fish_id) %>%
dplyr::distinct()
)
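Note that `distinct()` only drops rows that are identical in every column, so the same fish could survive twice if any field differs between the two sources. Here is a sketch of a stricter check on the `silly_source` key, using made-up rows. The mismatch in `sample_date` is contrived for illustration.

```r
library(dplyr)

# Made-up selections; PSTOCK16_2 appears in both with a different sample_date
cross_toy <- tibble::tibble(
  silly_source = c("PSTOCK16_1", "PSTOCK16_2"),
  sample_date = as.Date(c("2016-08-20", "2016-08-25"))
)
dyad_toy <- tibble::tibble(
  silly_source = "PSTOCK16_2",
  sample_date = as.Date("2016-08-26")
)

bound_toy <- dplyr::bind_rows(cross_toy, dyad_toy) %>%
  dplyr::distinct()

# Keys that still occur more than once after distinct()
dup_keys <- bound_toy %>%
  dplyr::count(silly_source) %>%
  dplyr::filter(n > 1)

dup_keys
```

An empty `dup_keys` on the real selection would confirm that each fish is listed once.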
Double check sample sizes
extraction_PSTOCK16_natural %>%
dplyr::count(silly)
extraction_PSTOCK16_natural %>%
dplyr::count(silly, sex)
Write it out for posterity
readr::write_csv(x = extraction_PSTOCK16_natural, file = "../output/extraction_PSTOCK16_natural.csv")
Need hatchery strays and natural-origin homing fish (i.e. those with natural-origin parents)
Split by sex, sampled throughout the season
Bad practice, but I’m going to grab output data from PWS-Pink-Parentage
(
stockdale_parents_paired_13_15 <-
readr::read_csv(file = "../../PWS Pink/GitHub-PWS-Pink-Parentage/Stockdale/stock_parents_paired_13_15.csv")
)
Rows: 129 Columns: 32
-- Column specification ------------------------------------------------------------------------------------------------------------
Delimiter: ","
chr (17): Offspring, Parent, Parent_ID, SILLY.off, Sample Date.off, SEX.off, Otolith Mark Present.off, origin.off, Sex.off, SIL...
dbl (12): Fish ID.off, DNA Tray Code.off, DNA Tray Well Code.off, Sample Year.off, Length Mm.off, DOY.off, Fish ID.par, DNA Tra...
lgl (1): Otolith Mark ID.off
date (2): date.off, date.par
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
There is no triad data for the 2013/2015 pedigrees
(
stockdale_2015_natural_selection_N_dyad <-
stockdale_parents_paired_13_15 %>%
dplyr::rename(
silly = SILLY.off,
fish_id = "Fish ID.off",
dwp_barcode = "DNA Tray Code.off",
dwp_well = "DNA Tray Well Code.off",
sample_date = date.off,
sex = SEX.off,
otolith_mark_present = "Otolith Mark Present.off"
) %>%
dplyr::mutate(tissue_type = "Heart-bulbus arteriosus") %>%
tidyr::unite(
col = "silly_source",
c(silly, fish_id),
sep = "_",
remove = FALSE
) %>%
dplyr::filter(origin.par == "Natural") %>%
dplyr::group_by(sex) %>%
dplyr::slice_sample(n = 18, replace = FALSE) %>%
dplyr::ungroup() %>%
dplyr::arrange(silly, fish_id) %>%
dplyr::select(
silly_source,
silly,
fish_id,
dwp_barcode,
dwp_well,
tissue_type,
sample_date,
sex,
otolith_mark_present
)
)
stockdale_2015_natural_selection_N_dyad %>%
dplyr::count(sex)
Bind together, make sure no duplicates
(
extraction_PSTOCK15_natural <-
dplyr::bind_rows(
stockdale_2015_natural_selection_N_dyad
) %>%
dplyr::arrange(silly, fish_id) %>%
dplyr::distinct()
)
Double check sample sizes
extraction_PSTOCK15_natural %>%
dplyr::count(silly)
extraction_PSTOCK15_natural %>%
dplyr::count(silly, sex)
Write it out for posterity
readr::write_csv(x = extraction_PSTOCK15_natural, file = "../output/extraction_PSTOCK15_natural.csv")
Need hatchery strays and natural-origin homing fish (i.e. those with natural-origin parents)
Split by sex, sampled throughout the season
Bad practice, but I’m going to grab output data from PWS-Pink-Parentage
(
all_streams_parents_paired_14_16_cross <-
readr::read_csv(file = "../../PWS Pink/GitHub-PWS-Pink-Parentage/Stockdale_Hogan_Gilmour_Paddy_Erb/all_streams_parents_paired_14_16_cross.csv")
)
Rows: 1360 Columns: 89
-- Column specification ------------------------------------------------------------------------------------------------------------
Delimiter: ","
chr (35): offspring_id, stream_off, origin_off, sex_off, intertidal_off, otolith_mark_present_off, silly_off, sample_off, dna_t...
dbl (41): year_off, DOY_off, length_off, distance_mouth_off, distance_tide_off, fish_id_off, dna_tray_well_code_off, riverdist_...
lgl (10): otolith_mark_id_off, pre_spawn_off, partial_spawn_off, preyed_upon_off, pre_spawn_sire, pre_spawn_dam, partial_spawn_...
date (3): date_off, date_sire, date_dam
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
(
all_streams_parents_paired_14_16 <-
readr::read_csv(file = "../../PWS Pink/GitHub-PWS-Pink-Parentage/Stockdale_Hogan_Gilmour_Paddy_Erb/all_streams_parents_paired_14_16.csv")
)
Rows: 7988 Columns: 59
-- Column specification ------------------------------------------------------------------------------------------------------------
Delimiter: ","
chr (23): offspring_id, parent, parent_id, stream_off, origin_off, sex_off, intertidal_off, otolith_mark_present_off, silly_off...
dbl (27): year_off, DOY_off, length_off, distance_mouth_off, distance_tide_off, fish_id_off, dna_tray_well_code_off, riverdist_...
lgl (7): otolith_mark_id_off, pre_spawn_off, partial_spawn_off, preyed_upon_off, pre_spawn_par, partial_spawn_par, preyed_upon...
date (2): date_off, date_par
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
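Filter for Hogan and only natural x natural (NN) crosses, then keep one randomly chosen offspring per mating, per sire, and per dam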
hogan_2016_natural_selection_NN_cross <- all_streams_parents_paired_14_16_cross %>%
dplyr::filter(stream_off == stream_sire,
stream_off == "Hogan",
origin_sire == origin_dam,
origin_sire == "Natural") %>%
dplyr::group_by(mating_id) %>%
dplyr::slice_sample(n = 1, replace = FALSE) %>%
dplyr::group_by(parent_id_sire) %>%
dplyr::slice_sample(n = 1, replace = FALSE) %>%
dplyr::group_by(parent_id_dam) %>%
dplyr::slice_sample(n = 1, replace = FALSE) %>%
dplyr::ungroup()
hogan_2016_natural_selection_NN_cross %>%
dplyr::count(sex_off)
This is dumb, very few NN crosses, just use the dyad data.
(
hogan_2016_natural_selection_N_dyad <-
all_streams_parents_paired_14_16 %>%
dplyr::rename(
silly = silly_off,
fish_id = fish_id_off,
dwp_barcode = dna_tray_code_off,
dwp_well = dna_tray_well_code_off,
sample_date = date_off,
sex = sex_off,
otolith_mark_present = otolith_mark_present_off
) %>%
dplyr::mutate(tissue_type = "Heart-bulbus arteriosus") %>%
tidyr::unite(
col = "silly_source",
c(silly, fish_id),
sep = "_",
remove = FALSE
) %>%
dplyr::filter(
stream_off == stream_par,
stream_off == "Hogan",
origin_par == "Natural",
!(
parent_id %in% all_streams_parents_paired_14_16_cross$parent_id_dam
),
!(
parent_id %in% all_streams_parents_paired_14_16_cross$parent_id_sire
)
) %>%
dplyr::group_by(sex) %>%
dplyr::slice_sample(n = 18, replace = FALSE) %>%
dplyr::ungroup() %>%
dplyr::arrange(silly, fish_id) %>%
dplyr::select(
silly_source,
silly,
fish_id,
dwp_barcode,
dwp_well,
tissue_type,
sample_date,
sex,
otolith_mark_present
)
)
hogan_2016_natural_selection_N_dyad %>%
dplyr::count(sex)
Bind together, make sure no duplicates
(
extraction_PHOGAN16_natural <-
dplyr::bind_rows(
hogan_2016_natural_selection_N_dyad
) %>%
dplyr::arrange(silly, fish_id) %>%
dplyr::distinct()
)
Double check sample sizes
extraction_PHOGAN16_natural %>%
dplyr::count(silly)
extraction_PHOGAN16_natural %>%
dplyr::count(silly, sex)
Write it out for posterity
readr::write_csv(x = extraction_PHOGAN16_natural, file = "../output/extraction_PHOGAN16_natural.csv")
Need hatchery strays and natural-origin homing fish (i.e. those with natural-origin parents)
Split by sex, sampled throughout the season
Bad practice, but I’m going to grab output data from PWS-Pink-Parentage
(
hogan_parents_paired_13_15 <-
readr::read_csv(file = "../../PWS Pink/GitHub-PWS-Pink-Parentage/Hogan/hogan_parents_paired_13_15.csv")
)
Rows: 110 Columns: 32
-- Column specification ------------------------------------------------------------------------------------------------------------
Delimiter: ","
chr (17): Offspring, Parent, Parent_ID, SILLY.off, Sample Date.off, SEX.off, Otolith Mark Present.off, origin.off, Sex.off, SIL...
dbl (12): Fish ID.off, DNA Tray Code.off, DNA Tray Well Code.off, Sample Year.off, Length Mm.off, DOY.off, Fish ID.par, DNA Tra...
lgl (1): Otolith Mark ID.off
date (2): date.off, date.par
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
There is no triad data for the 2013/2015 pedigrees
(
hogan_2015_natural_selection_N_dyad <-
hogan_parents_paired_13_15 %>%
dplyr::rename(
silly = SILLY.off,
fish_id = "Fish ID.off",
dwp_barcode = "DNA Tray Code.off",
dwp_well = "DNA Tray Well Code.off",
sample_date = date.off,
sex = SEX.off,
otolith_mark_present = "Otolith Mark Present.off"
) %>%
dplyr::mutate(tissue_type = "Heart-bulbus arteriosus") %>%
tidyr::unite(
col = "silly_source",
c(silly, fish_id),
sep = "_",
remove = FALSE
) %>%
dplyr::filter(origin.par == "Natural") %>%
dplyr::group_by(sex) %>%
dplyr::slice_sample(n = 18, replace = FALSE) %>%
dplyr::ungroup() %>%
dplyr::arrange(silly, fish_id) %>%
dplyr::select(
silly_source,
silly,
fish_id,
dwp_barcode,
dwp_well,
tissue_type,
sample_date,
sex,
otolith_mark_present
)
)
hogan_2015_natural_selection_N_dyad %>%
dplyr::count(sex)
Bind together, make sure no duplicates
(
extraction_PHOGAN15_natural <-
dplyr::bind_rows(
hogan_2015_natural_selection_N_dyad
) %>%
dplyr::arrange(silly, fish_id) %>%
dplyr::distinct()
)
Double check sample sizes
extraction_PHOGAN15_natural %>%
dplyr::count(silly)
extraction_PHOGAN15_natural %>%
dplyr::count(silly, sex)
Write it out for posterity
readr::write_csv(x = extraction_PHOGAN15_natural, file = "../output/extraction_PHOGAN15_natural.csv")
For Stockdale and Hogan 2015 and 2016
Split by sex and hatchery (only AFK and WNH, no broodsource for CCH and too few VFDA)
Random with respect to sample location and sample date
Otolith read data lives here…
(
oceanak_hatchery_og <-
readr::read_csv(file = "../data/Stockdale 2015-2016 Hogan 2015-2016 Spring 2014-2015 AHRP Salmon Biological Data 20220322_115608.csv")
)
Rows: 80100 Columns: 22
-- Column specification -------------------------------------------------------------------------------------------------------------
Delimiter: ","
chr (9): SILLY_CODE, SEX, TISSUE_TYPE, DNA_TRAY_CODE, LOCATION_CODE, SAMPLE_ID, OTOLITH_MARK_PRESENT, OTOLITH_MARK_ID, OTOLITH_M...
dbl (7): COLLECTION_ID, FISH_ID, LENGTH_MM, DNA_TRAY_WELL_CODE, SAMPLE_YEAR, IS_MISSING_PAIRED_DATA_EXISTS, WELL_HAS_MORE_THAN_O...
lgl (5): TARGET_DNA_TRAY_CODE, TARGET_DNA_TRAY_WELL_POS, TARGET_CONTAINER_ARRAY_TYPE_ID, CONTAINER_ARRAY_TYPE, DETERMINATION_COL...
dttm (1): SAMPLE_DATE
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Modify, filter for silly codes of interest, tissue type “Heart-bulbus arteriosus”, otolith-marked fish of known sex, and only AFK and WNH individuals
(
oceanak_hatchery <- oceanak_hatchery_og %>%
dplyr::filter(
SILLY_CODE %in% c("PSTOCK16", "PSTOCK15", "PHOGAN16", "PHOGAN15"),
TISSUE_TYPE == "Heart-bulbus arteriosus",
OTOLITH_MARK_PRESENT == "YES",
SEX != "U"
) %>%
dplyr::rename(
silly = SILLY_CODE,
fish_id = FISH_ID,
tissue_type = TISSUE_TYPE,
dwp_barcode = DNA_TRAY_CODE,
dwp_well = DNA_TRAY_WELL_CODE,
sample_date = SAMPLE_DATE,
sex = SEX,
otolith_mark_present = OTOLITH_MARK_PRESENT,
otolith_mark_id = OTOLITH_MARK_ID
) %>%
dplyr::mutate(
# SAMPLE_DATE is read in as a datetime, so convert it directly to a Date
sample_date = lubridate::as_date(sample_date),
hatchery = stringr::str_sub(
string = otolith_mark_id,
start = 1,
end = 3
)
) %>%
dplyr::filter(hatchery %in% c("AFK", "WNH")) %>%
tidyr::unite(
col = "silly_source",
c(silly, fish_id),
sep = "_",
remove = FALSE
) %>%
dplyr::select(
silly_source,
silly,
fish_id,
dwp_barcode,
dwp_well,
tissue_type,
sample_date,
sex,
hatchery,
otolith_mark_present,
otolith_mark_id
)
)
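The hatchery code above is taken as the first three characters of the otolith mark ID. A small sketch of that step on made-up IDs follows; the ID strings in `toy_marks` are hypothetical, and the real `OTOLITH_MARK_ID` format is only assumed to begin with the hatchery abbreviation.

```r
library(dplyr)

# Hypothetical otolith mark IDs; the real format is an assumption
toy_marks <- tibble::tibble(
  otolith_mark_id = c("AFK14A", "WNH15B", "CCH14C", "VFDA15A", NA)
)

toy_hatchery <- toy_marks %>%
  dplyr::mutate(
    hatchery = stringr::str_sub(string = otolith_mark_id, start = 1, end = 3)
  )

# Tabulate the derived codes before filtering on them
toy_hatchery %>%
  dplyr::count(hatchery)
```

A four-letter abbreviation such as VFDA would come out truncated to three letters under this approach, which does not matter when only AFK and WNH are retained, and a missing mark ID stays `NA`.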
How many?
oceanak_hatchery %>%
dplyr::count(silly, sex, hatchery)
From each silly, randomly sample 9 otolith-marked (i.e. hatchery-origin) fish per sex from each hatchery (AFK and WNH). Total of 18 fish per sex and 36 fish per silly.
(
extraction_hatchery_stray <- oceanak_hatchery %>%
dplyr::filter(otolith_mark_present == "YES",
sex != "U") %>%
dplyr::group_by(silly, sex, hatchery) %>%
dplyr::slice_sample(n = 9, replace = FALSE) %>%
dplyr::ungroup() %>%
dplyr::arrange(silly, fish_id) %>%
dplyr::select(
silly_source,
silly,
fish_id,
dwp_barcode,
dwp_well,
tissue_type,
sample_date,
sex,
hatchery,
otolith_mark_present,
otolith_mark_id
)
)
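Because `dplyr::slice_sample()` draws at random, rerunning the chunk above selects different fish unless a seed is set first. A minimal sketch of a seeded, stratified draw follows; `toy_pool` is hypothetical and the seed value is arbitrary (the notebook itself does not show one).

```r
library(dplyr)

# Hypothetical candidate pool: the male stratum has fewer fish than the target
toy_pool <- tibble::tibble(
  silly = "PSTOCK16",
  sex = rep(c("F", "M"), times = c(12, 4)),
  fish_id = 1:16
)

set.seed(2022)  # arbitrary seed, an assumption; not taken from the notebook

toy_sample <- toy_pool %>%
  dplyr::group_by(silly, sex) %>%
  dplyr::slice_sample(n = 9, replace = FALSE) %>%
  dplyr::ungroup()

# Groups smaller than n are returned whole rather than raising an error
toy_sample %>%
  dplyr::count(silly, sex)
```

The silent truncation is why the sample sizes need checking after every draw: a short stratum produces no warning.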
Double check
extraction_hatchery_stray %>%
dplyr::count(silly, sex, hatchery)
PSTOCK16 is short 7 WNH females and 8 WNH males. Will replace them with additional PSTOCK16 AFK fish.
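The shortfall described above can also be tabulated instead of read off the counts by eye. A minimal sketch, assuming a target of 9 fish per silly, sex, and hatchery; the counts in `toy_counts` are hypothetical values chosen to mirror the shortfall described here.

```r
library(dplyr)

target_per_stratum <- 9  # per silly x sex x hatchery, as in the sampling step

# Hypothetical selection counts, not actual notebook output
toy_counts <- tibble::tribble(
  ~silly,     ~sex, ~hatchery, ~n,
  "PSTOCK16", "F",  "AFK",     9,
  "PSTOCK16", "F",  "WNH",     2,
  "PSTOCK16", "M",  "AFK",     9,
  "PSTOCK16", "M",  "WNH",     1
)

# Keep only the strata that fall short of the target
shortfall <- toy_counts %>%
  dplyr::mutate(n_missing = target_per_stratum - n) %>%
  dplyr::filter(n_missing > 0)

shortfall
```

A stratum with no fish at all would not appear in `dplyr::count()` output, so it would need to be added explicitly (for example with `tidyr::complete()`) before computing the difference.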
(
extraction_hatchery_stray_PSTOCK16_AFK_female <- oceanak_hatchery %>%
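dplyr::filter(silly == "PSTOCK16",
sex == "F",
hatchery == "AFK") %>%
# Exclude fish already selected above so the top-up adds 7 new individuals
dplyr::anti_join(extraction_hatchery_stray, by = "silly_source") %>%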
dplyr::group_by(silly, sex, hatchery) %>%
dplyr::slice_sample(n = 7, replace = FALSE) %>%
dplyr::ungroup() %>%
dplyr::arrange(silly, fish_id) %>%
dplyr::select(
silly_source,
silly,
fish_id,
dwp_barcode,
dwp_well,
tissue_type,
sample_date,
sex,
hatchery,
otolith_mark_present,
otolith_mark_id
)
)
(
extraction_hatchery_stray_PSTOCK16_AFK_male <- oceanak_hatchery %>%
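dplyr::filter(silly == "PSTOCK16",
sex == "M",
hatchery == "AFK") %>%
# Exclude fish already selected above so the top-up adds 8 new individuals
dplyr::anti_join(extraction_hatchery_stray, by = "silly_source") %>%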
dplyr::group_by(silly, sex, hatchery) %>%
dplyr::slice_sample(n = 8, replace = FALSE) %>%
dplyr::ungroup() %>%
dplyr::arrange(silly, fish_id) %>%
dplyr::select(
silly_source,
silly,
fish_id,
dwp_barcode,
dwp_well,
tissue_type,
sample_date,
sex,
hatchery,
otolith_mark_present,
otolith_mark_id
)
)
Bind together, make sure no duplicates
(
extraction_hatchery_stray_final <-
dplyr::bind_rows(
extraction_hatchery_stray,
extraction_hatchery_stray_PSTOCK16_AFK_female,
extraction_hatchery_stray_PSTOCK16_AFK_male
) %>%
dplyr::distinct()
)
Double check sample sizes
extraction_hatchery_stray_final %>%
dplyr::count(silly)
extraction_hatchery_stray_final %>%
dplyr::count(silly, sex)
extraction_hatchery_stray_final %>%
dplyr::count(silly, sex, hatchery)
Write it out for posterity
readr::write_csv(x = extraction_hatchery_stray_final, file = "../output/extraction_selection_hatchery_stray.csv")
Spring Creek! Split by sex, sampled throughout the season
Otolith read data lives here…
(
oceanak_spring_og <-
readr::read_csv(file = "../data/Spring 2014-2015 AHRP Salmon Biological Data 20220322_122538.csv")
)
Warning: One or more parsing issues, see `problems()` for details
Rows: 25240 Columns: 22
-- Column specification -------------------------------------------------------------------------------------------------------------
Delimiter: ","
chr (8): SILLY_CODE, SEX, TISSUE_TYPE, DNA_TRAY_CODE, LOCATION_CODE, SAMPLE_ID, OTOLITH_MARK_PRESENT, OTOLITH_MARK_STATUS_CODE
dbl (7): COLLECTION_ID, FISH_ID, LENGTH_MM, DNA_TRAY_WELL_CODE, SAMPLE_YEAR, IS_MISSING_PAIRED_DATA_EXISTS, WELL_HAS_MORE_THAN_O...
lgl (6): OTOLITH_MARK_ID, TARGET_DNA_TRAY_CODE, TARGET_DNA_TRAY_WELL_POS, TARGET_CONTAINER_ARRAY_TYPE_ID, CONTAINER_ARRAY_TYPE, ...
dttm (1): SAMPLE_DATE
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Modify, filter for silly codes of interest, tissue type “Heart-bulbus arteriosus”, and natural-origin fish (no otolith mark) of known sex
(
oceanak_spring <- oceanak_spring_og %>%
dplyr::filter(
SILLY_CODE %in% c("PSPRIN14", "PSPRIN15"),
TISSUE_TYPE == "Heart-bulbus arteriosus",
OTOLITH_MARK_PRESENT == "NO",
SEX != "U"
) %>%
dplyr::rename(
silly = SILLY_CODE,
fish_id = FISH_ID,
tissue_type = TISSUE_TYPE,
dwp_barcode = DNA_TRAY_CODE,
dwp_well = DNA_TRAY_WELL_CODE,
sample_date = SAMPLE_DATE,
sex = SEX,
otolith_mark_present = OTOLITH_MARK_PRESENT
) %>%
dplyr::mutate(
# SAMPLE_DATE is read in as a datetime, so convert it directly to a Date
sample_date = lubridate::as_date(sample_date)
) %>%
tidyr::unite(
col = "silly_source",
c(silly, fish_id),
sep = "_",
remove = FALSE
) %>%
dplyr::select(
silly_source,
silly,
fish_id,
dwp_barcode,
dwp_well,
tissue_type,
sample_date,
sex,
otolith_mark_present
)
)
From each silly, randomly sample 18 fish from each sex that are not otolith marked (i.e. natural-origin fish). Total of 36 fish per silly, 36 fish per sampling unit.
(
extraction_spring <- oceanak_spring %>%
dplyr::filter(otolith_mark_present == "NO",
sex != "U") %>%
dplyr::group_by(silly, sex) %>%
dplyr::slice_sample(n = 18, replace = FALSE) %>%
dplyr::ungroup() %>%
dplyr::arrange(silly, fish_id) %>%
dplyr::select(
silly_source,
silly,
fish_id,
dwp_barcode,
dwp_well,
tissue_type,
sample_date,
sex,
otolith_mark_present
)
)
Double check
extraction_spring %>%
dplyr::count(silly, sex, otolith_mark_present)
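The visual double check above could be backed by an assertion that stops the notebook when a stratum misses its target. A minimal sketch, assuming 18 fish per silly and sex; `toy_selection` is a hypothetical table with the expected layout, not the real selection.

```r
library(dplyr)

target_per_sex <- 18  # per silly x sex, as in the sampling step

# Hypothetical selection with the expected layout: two sillys x two sexes
toy_selection <- tidyr::expand_grid(
  silly = c("PSPRIN14", "PSPRIN15"),
  sex = c("F", "M"),
  replicate = seq_len(target_per_sex)
)

strata_counts <- toy_selection %>%
  dplyr::count(silly, sex)

# Stops with an error if a stratum is absent or off target
stopifnot(
  nrow(strata_counts) == 4,
  all(strata_counts$n == target_per_sex)
)
```

Checking the number of strata as well as their sizes catches a stratum that is entirely absent, which a count table alone would not flag.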
Write it out for posterity
readr::write_csv(x = extraction_spring, file = "../output/extraction_selection_PSPRIN14_PSPRIN15.csv")
End…